On the reliability of voting processes: the Mexican case 



G. Baez, H. Hernandez-Saldana, R.A. Mendez-Sanchez

Lab. de Sistemas Dinámicos, Dpto. de Ciencias Básicas, Universidad Autónoma
Metropolitana Azcapotzalco, Av. San Pablo 180, 02200, Mexico D.F., Mexico.
Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México, A.P. 48-3,
62210, Cuernavaca, Mor., Mexico.



Abstract 

Analysis of vote distributions using current tools in statistical physics is of
increasing interest. While data considered for physics studies are subject to
a careful understanding of the error sources, such analysis is almost
absent in studies of voting processes. We analyze the statistical properties of
vote records, paying particular attention to correlations in real time and the
distribution of errors. We use the records published in real time for the
elections for president, deputies and senators in Mexico in 2006. The
real-time signal does not appear in a random way, since it does not come from
a random variable. Several self-consistency tests are applied to the records,
showing a mixed error distribution, but the fraction of inconsistent records
is around 50% in all cases (president, deputy and senator elections). Vote
distributions are obtained for all the parties in all cases. Parties and
candidates with few votes, annulled votes and non-registered candidates follow
a power law distribution, and the corporate party follows a daisy model
distribution. Parties and candidates with many votes show a mixed behavior.
Thus we show that the election, as a measurement process, has a statistical
error larger than the difference between the two main presidential candidates.
This result means that formally, under statistical criteria, the electoral
process is not conclusive.

Keywords: vote distribution, election, close elections, opinion polls, error 
analysis 

1. Introduction



The study of how we vote and several aspects of its dynamics has been of
interest in recent years [Ball, 2003; Borghesi et al., 2006; Castellano et al.,
2007; Costa Filho et al., 1999], an interest increased by the large amount of
data currently available on the Internet. In this context, tools of statistical
mechanics, complex systems and other fields of physics and mathematics have
been used to understand the voting process [Borghesi et al., 2006; Castellano
et al., 2007].



In an increasing number of cases, the advent of new technologies is changing
the way in which elections are conducted: not only the way that information
appears in the mass media, but also through the emergence of real-time data.
Public access to partial results at the same time that votes are counted could
be the rule in the future. In many processes the results are clear; however, in
close elections, some counterintuitive results appear. Understanding this
behavior is important, since our experience with uncorrelated situations gives
us an intuition that could be wrong when we deal with correlated variables.

The faithfulness of the data is an important issue, since the record of social 
processes is subject to all kinds of uncontrollable factors. Furthermore, while 
data considered for analysis in physics are subject to a careful understanding 
of the sources of error, such analyses are almost absent in studies of voting
processes.

In this work, we present an approach to both these problems: we analyze 
the real time results obtained during the federal election in Mexico in the 
year of 2006, and we offer an analysis of the distribution of errors obtainable 
from this public database. 

We focus our study on the data obtained from the "Program of Preliminary
Electoral Results", or PREP, after its acronym in Spanish [PREP, 2006],
which is the system implemented by the electoral authorities (Instituto
Federal Electoral, IFE). In Mexico, the election is performed using electoral
cabins (polling stations) that admit, by construction, 750 voters and that
are, approximately, uniformly distributed over the whole population with
the right to cast a vote. The PREP works with certificates that are stamped
on the packets of ballots (named electoral packets). On those certificates the
citizen authorities of each electoral cabin write the number of votes received
by each party, the total number of those votes, etc., at the end of the
electoral day. Later, the authorities of each electoral cabin deliver the
electoral packets and their certificates to the capture centers. The time of
arrival and the results stamped on the electoral packets are captured. The
records are sent to the Mexico City headquarters and published on the electoral
authorities' website [http://www.ife.org.mx, 2006]. The final results are
recorded in a public file [PREP, 2006]. The analysis presented here is based
on such a file. We label the political parties as P1, P2, ..., P5, as they appear in the

Figure 1: (Color online) Real-time data given by the Program of Preliminary Electoral
Results (PREP) for the three main presidential candidates. The plot shows the percentage
of votes obtained per candidate against the percentage of processed vote certificates. The
blue circles correspond to the candidate of the P1 party, the green triangles to the P2 party,
and the orange squares to the P3 party. The data were taken from the internet [PREP, 2006;
Mochan, 2006]. We added 13% of votes to P2 in order to show all the results in a smaller
window. For the parties and candidates keys, see [Acronyms, 2009].



database; the keys appear in Ref. [Acronyms, 2009].

The paper is organized as follows: In Section 2, we show the time
behavior of the record that PREP yields and we analyze the reliability of the
system. Using some simple conservation laws (self-consistency), in Section 3
we make the error analysis of the data which appear in the PREP file. In
Section 4 we show and analyze the distributions of votes for deputies, senators
and president. A brief conclusion about the error of the election using the
results of the PREP follows.

2. Time behavior of PREP



Around 20:00 hrs on July 2nd, 2006, the Program of Preliminary Electoral
Results (PREP) of IFE [PREP, 2006] began to publish on its website the vote
percentages for the presidential election held that day. The results were
updated every 5 minutes and the news services reproduced them. A very close
election, according to opinion polls, was expected between the two main
candidates, of the P1 and P3 parties (see Ref. [Acronyms, 2009] for the key of
the party names). At the beginning, the tendency showed a decrease in the vote
percentage of the P1 candidate and an increase for the candidate of the P3
party. At first sight, a crossing seemed to be imminent between the percentages
of votes obtained by the two candidates. For this reason, several scientists as
well as laymen started capturing the real-time data using different methods,
some by hand, others automatically with programs written in languages like
Perl [Mochan, 2006].



In Fig. 1 we show the plot of the real-time data given by the PREP
for the three main candidates of the P1-P3 parties. This plot shows roughly two
tendencies for the vote percentage obtained by the candidate of the P3 party. The
increasing tendency changed to a decreasing one around 3:00 AM (~ 70%
in Fig. 1) of July 3rd. As can be seen in the same figure, the P2 candidate
increased his vote percentage, changing the tendencies of the candidates of P1
and P3, respectively.

As a result, no crossing between the vote percentages of the two main
candidates was found with the PREP in real time. A common belief is that a
graph like Figure 1 should approach its final value in a direct way, so that a
change like the one presented here looks suspicious. However, one should be
careful, since we cannot assume a priori uncorrelated behavior. The graph
shows vote percentages and, evidently, this fact introduces correlations
between the variables. Additionally, the temporal data are not taken from a
uniform sample, and so we do not expect a fast convergence to the final
results. Assuming a clean election, the first data that arrived at the capture
centers were from sites with better transport networks and/or a better vote
counting performance. In Ref. [Pliego, 2007], a correlation with an official
marginalization index is presented that confirms this assertion: reports
from polling stations inside regions of low marginalization index arrived early
at the capture centers. Here, we present examples of how the sampling of polling
stations rules the shape of the vote percentage graph. Evidently, the final
values are the same.

        P1        P2        P3        P4        P5        I.C.      An.
P1      ...
P2      ...       ...
P3      ...       0.1461    1.0208
P4     -0.2572    0.1708    0.2205    0.0786
P5      0.6982   -0.9594   -0.0754   -0.1618    1.0035
I.C.    0.0462    0.0429   -0.1234   -0.0092   -0.0034    0.7969
An.    -0.9492    0.8000    0.6294    0.2476   -0.7781   -0.1455    1.0011

Table 1: Correlation matrix of vote percentage. I.C. stands for independent candidates
and An. for annulled votes.



A calculation of the correlation coefficients shows the obvious relation
between the percentages (see Table 1), except in the case of the presidential
candidate of P4, whose autocorrelation is 0.0786. Other anomalies appear
with this party and are reported below. Note that not all the autocorrelations
in Table 1 are one, since the percentage is calculated on the variables shown.
A large variability in the PREP records exists and is reported in Section 3.
However, the autocorrelation of votes received is one in all cases. A test
of the statistical significance of the correlation coefficients can be done using
a resampling method [Good, 2005]. However, we found it much more instructive
to show the behavior of the percentages under resampling.
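
As a rough illustration of why percentages are correlated even when the underlying counts are not, the following sketch uses purely synthetic data (three hypothetical parties with independent vote counts per cabin; none of this is the PREP data or the computation behind Table 1). Because the three percentages must add up to 100, they come out negatively correlated by construction.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def synthetic_counts(n_cabins, seed=0):
    """Independent vote counts for three hypothetical parties per cabin."""
    rng = random.Random(seed)
    return [[rng.randint(50, 250) for _ in range(3)] for _ in range(n_cabins)]

counts = synthetic_counts(5000)
shares = [[100.0 * v / sum(row) for v in row] for row in counts]

# Correlation between raw counts (independent by construction) ...
r_counts = pearson([r[0] for r in counts], [r[1] for r in counts])
# ... and between the corresponding percentages, which must sum to 100.
r_shares = pearson([r[0] for r in shares], [r[1] for r in shares])

print(f"counts: {r_counts:+.3f}   percentages: {r_shares:+.3f}")
```

With three statistically identical parties the correlation between any two percentages is close to -1/2, while the counts themselves remain essentially uncorrelated.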

We first checked that the final version of the PREP file reproduces
the behavior shown in Figure 1. Secondly, we plot the PREP data
sampled in different ways. Fig. 2 shows the percentage of votes with the
certificates sampled in the order of the final file, i.e., by alphabetical
order of the state name and not by arrival time, for the same cases as in the
previous figure. As can be seen, the tendency of each state of Mexico is
reflected in this plot. A clear correlation between the percentage of votes
and the way we sort the vote certificates is observed. In order to break down
such a correlation, we sort the certificates in a random way and recalculate
the percentages, i.e., we consider a uniform random sample from the polling
station records. This procedure yields a fast convergence to the final values,
except for small fluctuations, as expected. In Fig. 3(a) and (b) we show two
realizations of such a shuffled process, one (Fig. 3(a)) with the P1 candidate
winning at the beginning and another (Fig. 3(b)) with an initial advantage for
the P3 party. Hence, we have two different samples of the polling stations
taken uniformly that present the expected convergence but with different
beginnings.
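
The effect of the ordering can be sketched with a toy example. The records below are synthetic (two hypothetical regions with opposite preferences and two parties); `running_share` is an illustrative helper, not part of any PREP tooling. Processing the records region by region gives a slowly drifting curve, while a uniform shuffle of the same records converges quickly to the same final value.

```python
import random

def running_share(records, party):
    """Cumulative percentage of `party` after each processed record."""
    total = won = 0
    out = []
    for rec in records:
        total += sum(rec)
        won += rec[party]
        out.append(100.0 * won / total)
    return out

rng = random.Random(1)
# Two hypothetical regions: party 0 leads in the first, party 1 in the second.
region_a = [(rng.randint(120, 200), rng.randint(60, 140)) for _ in range(2000)]
region_b = [(rng.randint(60, 140), rng.randint(120, 200)) for _ in range(2000)]

ordered = region_a + region_b   # region by region, like an alphabetical sort
shuffled = ordered[:]
rng.shuffle(shuffled)           # uniform random sample of the same records

share_ordered = running_share(ordered, 0)
share_shuffled = running_share(shuffled, 0)

# Compare both curves after 25% of the records and at the end.
print(share_ordered[999], share_shuffled[999], share_ordered[-1])
```

Both orderings end at exactly the same percentage, since they contain the same records; only the path towards it differs.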

Figure 2: (Color online) Percentage of votes as a function of the number of certificate
records (sorted as in the PREP file) for the three main presidential candidates. The
vertical lines separate the different states of Mexico, sorted alphabetically. The large
change around 20000 records corresponds to the capital of Mexico, governed by the main
party inside P3.



Since there is no unique way to order the sample (as explained before,
that depends on several factors), it is possible to sort the data in such a way
that any of the three main candidates has an initial advantage for a small
number of processed votes. In Fig. 3(c), we select all the certificates where
the P2 candidate won and put them at the beginning of the sample. Clearly, this
candidate retains his advantage for around 11% of processed votes and loses
it after that. In Fig. 3(d) we show the result of placing at the end of the
counting process (for a random shuffle) all the cabins where the candidate
of the P3 party won (~ 40%). Note that until this happens the candidate
remains in third place all the time. In other words, where this candidate does
not win the polling station he is the third option, with a percentage of votes
even smaller than that of P2.
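
A minimal sketch of such a forced ordering, again on synthetic certificates (three hypothetical parties, with party 2 the weakest overall; `winner` and `leader_after` are illustrative helpers): moving the certificates won by party 2 to the front makes it the early leader, although it finishes last.

```python
import random

def winner(cert):
    """Index of the party with the most votes (first one in case of a tie)."""
    return cert.index(max(cert))

def leader_after(records, k):
    """Index of the party leading the cumulative count after k records."""
    totals = [sum(col) for col in zip(*records[:k])]
    return totals.index(max(totals))

rng = random.Random(2)
# Hypothetical certificates: votes for parties 0, 1 and 2 in each cabin.
certs = [(rng.randint(80, 200), rng.randint(80, 200), rng.randint(40, 150))
         for _ in range(3000)]

# Stable sort: certificates won by party 2 first, the rest keep their order.
resorted = sorted(certs, key=lambda c: winner(c) != 2)

n_won = sum(winner(c) == 2 for c in certs)
early_leader = leader_after(resorted, n_won)
final_leader = leader_after(resorted, len(resorted))

print(n_won, early_leader, final_leader)
```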

Figure 3: (Color online) Percentage of votes for the three main candidates as a function
of processed vote certificates (%), shuffling the results in four different ways in order
to obtain different temporal samples. Blue lines show the percentage of votes for P1. The
green lines correspond to P2 (we add 13% to this result in (a) and (b)). The orange lines
are the results for P3. In (a) and (b) we present two realizations of a random shuffling of
the PREP file. Notice that we plot only up to 35% of the total votes. In (c) we present the
results sorted as in the PREP file but with the vote certificates where P2 won placed
at the beginning of the computation. In (d), the results when the vote certificates where
P3 won were placed at the end of a randomly shuffled file (the same realization as in (a)).

The percentage of votes as a function of the number of records for the
small parties is plotted in Fig. 4. As can be seen, the number of annulled
votes is more than twice the final difference between the two main candidates.
Another interesting observation from this figure is that the number of
votes for the non-registered candidates is very large. This means that one of
the independent candidates could obtain one or more seats in the chambers
since, under Mexican law, there are some positions, in both chambers,
allocated in proportion to the total number of votes obtained by each party.
The same is true for the case of annulled votes, which represent ~ 2% of the
votes counted. From Figs. 2 and 4 one can see, directly, that there are strong
correlations between the parties that obtained large and small percentages
of votes. The percentages of votes for senators and deputies show a similar
behavior to those of the presidential candidates, with the sole exception of the
P4 party, which shows a number of votes approximately four times larger for
the chamber positions than for president.

3. Conservation laws in elections 

In the cabin certificates several data are recorded: the total number of
ballots received at the beginning of the electoral process (Br), the number of
remaining (unused) ballots (Bs), the number of voters (V), the number of
deposited ballots per cabin (Bd) and the number of votes received by each
party/candidate (V_i, i = P1, P2, P3, P4, P5, non-registered candidates and
annulled votes). Based on this information, there are some conservation laws
that can be checked, for self-consistency, in this election. The quantities
considered are those that were calculated and registered independently by the
electoral authorities when the votes were counted. In this sense, they are
non-trivial. For instance, one can check the loss or appearance of ballots.
In particular, we study the total number of ballots, Br, against the sum of
the number of remaining ballots, Bs, and the number of voters per cabin,
V, i.e., Br-(Bs+V). In principle, this number should be zero, but, as seen in
Fig. 5, the distribution of this quantity is peaked around zero but is neither
the expected Dirac delta function δ(x), nor a Gaussian or a Lorentzian. Data
on the positive axis mean lost ballots. Data on the negative axis indicate
the appearance of ballots. Zooms into different parts of this distribution show
the following facts: (1) The PREP preserves this number for only ~ 45% of the
cabins. This result is unfortunate for the electoral authorities (IFE) since
it says that PREP reliability is less than 50%. (2) The distribution is not

Figure 4: (Color online) Percentage of votes as a function of the number of records for the
other candidates and annulled votes. The vertical lines separate the different states
of Mexico, sorted alphabetically. Notice the correlation between the results presented here
and those of Figure 2.



completely symmetric. In particular, the peak around -250 is higher than
the peak at 250. (3) There are inconsistencies larger than 750 votes for
several cabins. (4) There are peaks at ±10, ±20, ... and also at ±100, ±200, ....
These peaks, we assume, are related to capture typos. (5) The peak at
the left side of the distribution shows a different behavior for senators than
in the other cases. This result cannot be statistically understood, since all
certificates (for president, senators and deputies) are captured in the same
way. A result like this means that the capture of the data was different for
three similar processes. (6) The distribution between 1 and 100 decays as a
power law, as can be seen in the insets of Figures 5 to 7.
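
The self-consistency checks used in this section can be sketched as follows. The three records and the dictionary keys are hypothetical (they are not the column names of the PREP file); for each certificate the sketch evaluates Br-(Bs+V) together with the two analogous residuals built from Bd and from the sum of the votes V_i, all of which should vanish for a consistent record.

```python
from collections import Counter

def residuals(cert):
    """Three self-consistency residuals for one certificate; all should be 0."""
    br, bs, v, bd = cert["Br"], cert["Bs"], cert["V"], cert["Bd"]
    votes = cert["votes"]
    return (br - (bs + v), br - (bs + bd), br - (bs + sum(votes)))

# Hypothetical records; field names are illustrative only.
certs = [
    {"Br": 750, "Bs": 300, "V": 450, "Bd": 450, "votes": [200, 150, 90, 10]},
    {"Br": 750, "Bs": 310, "V": 430, "Bd": 440, "votes": [210, 140, 80, 10]},
    {"Br": 500, "Bs": 100, "V": 410, "Bd": 400, "votes": [180, 120, 95, 5]},
]

res = [residuals(c) for c in certs]
hist = Counter(r[0] for r in res)        # distribution of Br - (Bs + V)
consistent = sum(r[0] == 0 for r in res) / len(res)

print(res)          # positive first entry: lost ballots; negative: extra ballots
print(consistent)   # fraction of cabins that conserve Br - (Bs + V)
```

The histogram `hist` plays the role of the distributions discussed here, and `consistent` is the analogue of the ~ 45% figure quoted in point (1).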

In Figs. 6 and 7 we test other conservation laws. In the first figure we plot
the distribution of differences between the total ballots received Br and the
sum of the ballots remaining Bs plus the ballots deposited in urns Bd per
cabin, i.e., Br-(Bs+Bd). In the second figure we plot the difference

Figure 5: Distributions of the difference between the total ballots received Br and the
sum of the remaining ballots Bs and the total number of voters V per cabin, Br-(Bs+V).
Negative values on the horizontal axis mean more votes than ballots and positive values
mean loss of ballots, but in both cases there is no conservation of the total number of
received ballots Br per cabin. The inset shows the left branch of the distribution in log-log
scale. This shows a power law decay. Note the several sharp peaks in the distribution along
the horizontal axis between the values 10 and 100. We assume that these peaks are related
to typos. We note that the distributions for senators show a strange behavior compared
with the corresponding distributions for president and deputies in the left branch.



between the total ballots received Br and the sum of the ballots remaining Bs
plus the sum of votes obtained by each political party, including null votes,
$\sum_i V_i$, i.e., Br-(Bs+$\sum_i V_i$). Both distributions should give a delta
function δ(x) in the ideal case. But, as seen in Figure 6, the distribution is
very asymmetric and presents extreme values around 750, as in Fig. 5. In the
insets we show that typos are also present and that power laws appear.
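
A power-law decay appears as a straight line in log-log scale, so its exponent can be roughly estimated from the slope of the histogram. The sketch below is only illustrative: the sample is synthetic (drawn with P(k) proportional to k^-2 on 1..50, an arbitrary choice), and an unweighted least-squares fit on a log-log histogram is a crude estimator compared with maximum-likelihood methods.

```python
import math
import random
from collections import Counter

def loglog_slope(values, lo=1, hi=50):
    """Least-squares slope of log(count) vs log(value) for lo <= value <= hi."""
    hist = Counter(v for v in values if lo <= v <= hi)
    keys = sorted(hist)
    xs = [math.log(v) for v in keys]
    ys = [math.log(hist[v]) for v in keys]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

rng = random.Random(3)
support = list(range(1, 51))
weights = [k ** -2.0 for k in support]   # synthetic P(k) ~ k^-2
sample = rng.choices(support, weights=weights, k=200000)

slope = loglog_slope(sample)
print(f"estimated exponent: {slope:.2f}")
```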

In the previous analysis all the cabins were considered, notwithstanding
the fact that a selection is required, since many of the records present
alterations, as was documented in Ref. [Crespo, 2008]. However, statistical
analyses like the present one are valid, since the universal source of error
must be studied

Figure 6: Distribution of the difference between the total ballots received Br and the sum
of the ballots remaining Bs plus the ballots deposited in urns Bd per cabin, i.e., Br-(Bs+Bd).
Notice that the asymmetry between the two branches is similar for the three distributions,
but it is different from that in the previous figure. Again, the left branch of these
probability distributions shows the sharp peaks associated with capture mistakes. The inset
shows both branches of the distributions in log-log scale.



for future reference. Other errors, random or systematic, require future
analysis as well.

4. Distribution of the votes 

Histograms of the number of electoral cabins (polling stations) with a
certain number of votes are given in Fig. 8. The results for the P1-P3
presidential candidates appear in Fig. 8(a) and the corresponding results for
deputies and senators in panels (b) and (c). Panel (d) of this figure
corresponds to P4, P5, non-registered candidates and null votes for the
presidential election.

Histograms for the votes of P2 change very slowly in the three cases
(Fig. 8(a) to (c)). The tail of these distributions looks exponential (a fit
is shown). This is not the case for the distributions of the two

Figure 7: (Color online) Distributions of the difference between the total ballots received
Br and the sum of the ballots remaining Bs plus the sum of votes obtained by each
political party, including null votes, $\sum_i V_i$, that is, Br-(Bs+$\sum_i V_i$). The
meaning of the right and the left distribution branches is similar to that in Figure 5.
Notice that the three distributions are similar. The inset shows both branches of the
distributions in log-log scale.



main presidential candidates (Fig. 8(a)). The distribution of the votes for
P1 shows a very different behavior for electoral cabins with fewer than ~ 40
votes, since it starts flat. The distribution for P3 is also irregular. It shows
three different regimes. It looks like a distribution in which realizations
between 60 and 300 votes are missing. This could be due to two reasons: the
first is that the data were manipulated; the second is that the distribution
of the votes for P3 is composed of two or more distributions corresponding
to several voter dynamics [Merlin et al., 2004]. As an example, a weighted
sum of two distributions

$P = p\,P_{\mathrm{Daisy}}(\langle v_1 \rangle, \sigma_1) + (1 - p)\,P_{\text{log-normal}}(\langle v_2 \rangle, \sigma_2)$   (1)

with different centroids, $\langle v_1 \rangle$ and $\langle v_2 \rangle$, like a Daisy model (see below) and

Figure 8: Histograms showing the number of electoral cabins which obtained a certain
number of votes for the three main parties: (a) president, (b) deputies, (c) senators. In
(d) the results obtained for president by the small parties are given. Note that the P2
distribution has an exponential tail while the distributions of the small parties show a
shifted power law.



a log-normal could give rise to such deformed distributions. Both functions
correspond to different groups of voters, the first one to corporate voters
[Hernandez, 2009] and the other to certain proportional elections
[Castellano et al., 2007].

The distributions for the senators and deputies present similar behavior, as
seen in Fig. 8(b) and (c). In fact, the participation of a uniform group of
voters in important elections is, a priori, unlikely; so, it is enough to have
two kinds of voters following distributions with different maxima to have such
a behavior.
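
How a weighted sum of two components can leave an almost empty intermediate range can be sketched numerically. Everything in this example is an assumption made for illustration: the daisy-like component is modelled as a gamma distribution with shape r + 1 (the functional form of the daisy models), the weight p, the centroids and the widths are arbitrary, and no fit to the P3 data is implied.

```python
import math
import random

def sample_mixture(n, p, mean1, rank, mean2, sigma2, seed=0):
    """Draw n values from p * gamma-type 'daisy' + (1 - p) * log-normal."""
    rng = random.Random(seed)
    shape = rank + 1                         # rank-r daisy ~ gamma, shape r + 1
    mu2 = math.log(mean2) - sigma2 ** 2 / 2  # so the log-normal mean is mean2
    out = []
    for _ in range(n):
        if rng.random() < p:
            out.append(rng.gammavariate(shape, mean1 / shape))
        else:
            out.append(rng.lognormvariate(mu2, sigma2))
    return out

# Arbitrary illustrative parameters, not fitted to any election data.
votes = sample_mixture(20000, p=0.4, mean1=40.0, rank=2, mean2=300.0, sigma2=0.25)

low = sum(v < 100 for v in votes) / len(votes)
mid = sum(100 <= v < 180 for v in votes) / len(votes)
high = sum(v >= 180 for v in votes) / len(votes)

print(f"below 100: {low:.3f}   100-180: {mid:.3f}   above 180: {high:.3f}")
```

With two well-separated centroids, almost all realizations fall near one of them and the intermediate window is nearly empty, which is the qualitative feature attributed above to a mixture of voter groups.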

The histograms for the parties with a small number of votes are given
in Fig. 8(d). As seen, all histograms have a similar behavior, except for
small numbers of votes. All are shifted power laws, except for the P4
presidential candidate, which presents deviations. The results for deputies
and senators for the same parties are similar (not shown). These results
can be explained with several models of cluster growth in complex networks
[Costa Filho et al., 2006; Castellano et al., 2007], for instance, and appear
in other electoral processes [Costa Filho et al., 1999; Herrmann et al., 2004]
for proportional votes. Behind all these works is the search for a universal
dynamics in voting processes. We found an inconsistency in the tail of the
annulled votes. This distribution shows several electoral cabins with more
than 100 annulled votes. The probability of having such results is negligible,
so these results cannot be attributed to statistical fluctuations.
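
An order-of-magnitude estimate supports this statement. The sketch below assumes a simple binomial model that is not part of the original analysis: a full cabin of 750 voters, each annulling the vote independently with probability 0.02 (the overall share of annulled votes quoted above). Under these assumptions the probability of 100 or more annulled votes in one cabin is vanishingly small.

```python
import math

def binom_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p), with terms computed in log space."""
    total = 0.0
    for k in range(k_min, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1)
                    - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log(1 - p))
        total += math.exp(log_term)
    return total

# Assumed model: 750 independent voters, annulment probability 0.02 each.
p_tail = binom_tail(750, 0.02, 100)
print(f"P(at least 100 annulled votes) ~ {p_tail:.1e}")
```

The independence assumption is of course the one being questioned: cabins with such counts point to a mechanism other than random individual annulments.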

Finally, we return to the P2 case. The distribution of votes for this party is
clearly smooth, and a fit using the so-called Daisy models [Hernandez et al., 1999]
was performed previously on the final records for the Mexican elections in 2000,
2003 and 2006 [Hernandez, 2009]. The corresponding fit for the PREP
data corresponds to a rank r = 3 model

$P_3(s) = \frac{4^4}{3!}\, s^3 \exp(-4s)$   (2)

for the distribution's central part, and an r = 2 model

$P_2(s) = \frac{3^3}{2!}\, s^2 \exp(-3s)$   (3)

for the tail. A plot of the normalized distribution of votes appears in
Figure 9. Here the normalization of votes is performed using a 4th-degree
polynomial on windows of 300 and 3000 cabin certificates, obtaining a reliable
average density of events. Notice that no fitting parameter was used, in
contrast to considering a Brody or a Weibull distribution.
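
The absence of free parameters can be checked numerically. The sketch below assumes the usual gamma-type form of the rank-r daisy model, P_r(s) = (r+1)^(r+1) / r! * s^r * exp(-(r+1) s); the helper names are illustrative. For r = 2 and r = 3 both the normalization and the mean come out equal to one, so once the data are normalized to unit mean nothing is left to adjust.

```python
import math

def daisy_pdf(s, r):
    """Rank-r daisy model density: (r+1)^(r+1) / r! * s^r * exp(-(r+1) s)."""
    a = r + 1
    return a ** a / math.factorial(r) * s ** r * math.exp(-a * s)

def integrate(f, lo, hi, n=20000):
    """Composite midpoint rule on [lo, hi] with n sub-intervals."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Normalization and mean for the two ranks used in the text.
norm_2 = integrate(lambda s: daisy_pdf(s, 2), 0.0, 20.0)
mean_2 = integrate(lambda s: s * daisy_pdf(s, 2), 0.0, 20.0)
norm_3 = integrate(lambda s: daisy_pdf(s, 3), 0.0, 20.0)
mean_3 = integrate(lambda s: s * daisy_pdf(s, 3), 0.0, 20.0)

print(norm_2, mean_2, norm_3, mean_3)
```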



5. Conclusions 

We studied the statistical properties of the Mexican elections using the
Program of Preliminary Electoral Results (PREP) database. We have shown
that the data appearing in real time are not distributed in a random
way, and that this depends on the way the sample is taken. Correlations
between parties with large numbers of votes and small numbers of
votes are evident. There is also evidence of correlation with the annulled votes.
Quantities that should be conserved, for self-consistency of the records, were
studied. The distributions of such quantities have two main behaviors, with a
power law in the central part instead of the expected Gaussian or Lorentzian

Figure 9: Unfolded distribution for a randomly shuffled sequence of votes for the P2
presidential candidate (green line). Daisy models of rank r = 2 and 3 (see text for
details) are shown in dashed and continuous black lines.



distribution. Unexplained peaks are present in all the distributions (see
Figures 5-7). We suppose that they are due to typos in the capture of the data.
The number of records with inconsistencies is around 50% in all cases, for
the presidential election and for the upper and lower chambers as well. We also
have obtained the distributions of votes for the different parties. In
particular, the distribution of the party that was in power in Mexico for more
than 70 years behaves smoothly. Daisy models of 2nd and 3rd rank seem to
fit different parts of the measured distribution. In contrast, the distributions
of the parties with more votes are more complex and, probably, composed of
different voter dynamics. Distributions of small parties follow power laws.
The difference between the first and second place should be larger than the
error associated with the measurement, in this case, the electoral process.